Overview

Dataset statistics

Number of variables: 13
Number of observations: 1460
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1
Duplicate rows (%): 0.1%
Total size in memory: 148.4 KiB
Average record size in memory: 104.1 B

Variable types

NUM: 13

Warnings

Dataset has 1 (0.1%) duplicate rows [Duplicates]
MiscVal is highly skewed (γ1 = 24.47679419) [Skewed]
2ndFlrSF has 829 (56.8%) zeros [Zeros]
LowQualFinSF has 1434 (98.2%) zeros [Zeros]
WoodDeckSF has 761 (52.1%) zeros [Zeros]
OpenPorchSF has 656 (44.9%) zeros [Zeros]
EnclosedPorch has 1252 (85.8%) zeros [Zeros]
3SsnPorch has 1436 (98.4%) zeros [Zeros]
ScreenPorch has 1344 (92.1%) zeros [Zeros]
PoolArea has 1453 (99.5%) zeros [Zeros]
MiscVal has 1408 (96.4%) zeros [Zeros]
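
The zero percentages in the warnings follow directly from the reported zero counts and the 1460 observations. As a quick sanity check, the sketch below recomputes them in plain Python. The dictionary and variable names are illustrative and not part of the report.

```python
# Zero counts copied from the warnings list; 1460 is the reported number of observations.
N_OBSERVATIONS = 1460
zero_counts = {
    "2ndFlrSF": 829,
    "LowQualFinSF": 1434,
    "WoodDeckSF": 761,
    "OpenPorchSF": 656,
    "EnclosedPorch": 1252,
    "3SsnPorch": 1436,
    "ScreenPorch": 1344,
    "PoolArea": 1453,
    "MiscVal": 1408,
}

# Share of zeros per variable, rounded to one decimal place as in the report.
zero_share = {
    name: round(100 * count / N_OBSERVATIONS, 1)
    for name, count in zero_counts.items()
}

for name, share in zero_share.items():
    print(f"{name}: {share}% zeros")
```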

Reproduction

Analysis started: 2020-11-06 21:34:56.059678
Analysis finished: 2020-11-06 21:35:19.443380
Duration: 23.38 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

LotArea
Real number (ℝ≥0)

Distinct: 1073
Distinct (%): 73.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10516.82808
Minimum: 1300
Maximum: 215245
Zeros: 0
Zeros (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 1300
5-th percentile: 3311.7
Q1: 7553.5
Median: 9478.5
Q3: 11601.5
95-th percentile: 17401.15
Maximum: 215245
Range: 213945
Interquartile range (IQR): 4048

Descriptive statistics

Standard deviation: 9981.264932
Coefficient of variation (CV): 0.949075601
Kurtosis: 203.243271
Mean: 10516.82808
Median Absolute Deviation (MAD): 1998
Skewness: 12.20768785
Sum: 15354569
Variance: 99625649.65
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
7200                 25     1.7%
9600                 24     1.6%
6000                 17     1.2%
10800                14     1.0%
9000                 14     1.0%
8400                 14     1.0%
1680                 10     0.7%
7500                 9      0.6%
8125                 8      0.5%
9100                 8      0.5%
6120                 8      0.5%
6240                 8      0.5%
3182                 7      0.5%
7800                 6      0.4%
8450                 6      0.4%
10000                5      0.3%
4500                 5      0.3%
4435                 5      0.3%
5000                 5      0.3%
10140                5      0.3%
9750                 5      0.3%
10400                5      0.3%
5400                 5      0.3%
7018                 4      0.3%
11700                4      0.3%
Other values (1048)  1234   84.5%

Value                Count  Frequency (%)
1300                 1      0.1%
1477                 1      0.1%
1491                 1      0.1%
1526                 1      0.1%
1533                 2      0.1%
1596                 1      0.1%
1680                 10     0.7%
1869                 1      0.1%
1890                 2      0.1%
1920                 1      0.1%

Value                Count  Frequency (%)
215245               1      0.1%
164660               1      0.1%
159000               1      0.1%
115149               1      0.1%
70761                1      0.1%
63887                1      0.1%
57200                1      0.1%
53504                1      0.1%
53227                1      0.1%
53107                1      0.1%
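
Several of the LotArea statistics reported above are simple functions of the others. A minimal sketch, using only figures printed in this section (the variable names are illustrative), recomputes the range, the interquartile range, the coefficient of variation and the variance:

```python
from math import isclose

# Figures copied from the LotArea statistics in this report.
mean = 10516.82808
std = 9981.264932
q1, q3 = 7553.5, 11601.5
minimum, maximum = 1300, 215245

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std / mean                   # CV = standard deviation / mean
variance = std ** 2               # Variance = standard deviation squared

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.2f}")

# The recomputed values agree with the reported ones up to rounding.
print(isclose(cv, 0.949075601, rel_tol=1e-6))
print(isclose(variance, 99625649.65, rel_tol=1e-6))
```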
 

1stFlrSF
Real number (ℝ≥0)

Distinct: 753
Distinct (%): 51.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1162.626712
Minimum: 334
Maximum: 4692
Zeros: 0
Zeros (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 334
5-th percentile: 672.95
Q1: 882
Median: 1087
Q3: 1391.25
95-th percentile: 1831.25
Maximum: 4692
Range: 4358
Interquartile range (IQR): 509.25

Descriptive statistics

Standard deviation: 386.587738
Coefficient of variation (CV): 0.3325123481
Kurtosis: 5.745841482
Mean: 1162.626712
Median Absolute Deviation (MAD): 234.5
Skewness: 1.376756622
Sum: 1697435
Variance: 149450.0792
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
864                  25     1.7%
1040                 16     1.1%
912                  14     1.0%
848                  12     0.8%
894                  12     0.8%
672                  11     0.8%
816                  9      0.6%
630                  9      0.6%
936                  7      0.5%
960                  7      0.5%
483                  7      0.5%
832                  7      0.5%
764                  6      0.4%
990                  6      0.4%
728                  6      0.4%
1056                 6      0.4%
840                  6      0.4%
882                  6      0.4%
1728                 6      0.4%
720                  6      0.4%
796                  5      0.3%
1494                 5      0.3%
1422                 5      0.3%
520                  5      0.3%
1072                 5      0.3%
Other values (728)   1251   85.7%

Value                Count  Frequency (%)
334                  1      0.1%
372                  1      0.1%
438                  1      0.1%
480                  1      0.1%
483                  7      0.5%
495                  1      0.1%
520                  5      0.3%
525                  1      0.1%
526                  1      0.1%
536                  1      0.1%

Value                Count  Frequency (%)
4692                 1      0.1%
3228                 1      0.1%
3138                 1      0.1%
2898                 1      0.1%
2633                 1      0.1%
2524                 1      0.1%
2515                 1      0.1%
2444                 1      0.1%
2411                 1      0.1%
2402                 1      0.1%
 

2ndFlrSF
Real number (ℝ≥0)

ZEROS

Distinct: 417
Distinct (%): 28.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 346.9924658
Minimum: 0
Maximum: 2065
Zeros: 829
Zeros (%): 56.8%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 728
95-th percentile: 1141.05
Maximum: 2065
Range: 2065
Interquartile range (IQR): 728

Descriptive statistics

Standard deviation: 436.5284359
Coefficient of variation (CV): 1.258034335
Kurtosis: -0.5534635576
Mean: 346.9924658
Median Absolute Deviation (MAD): 0
Skewness: 0.8130298163
Sum: 506609
Variance: 190557.0753
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    829    56.8%
728                  10     0.7%
504                  9      0.6%
672                  8      0.5%
546                  8      0.5%
720                  7      0.5%
600                  7      0.5%
896                  6      0.4%
780                  5      0.3%
862                  5      0.3%
689                  5      0.3%
840                  5      0.3%
756                  5      0.3%
702                  4      0.3%
739                  4      0.3%
551                  4      0.3%
741                  4      0.3%
878                  4      0.3%
804                  4      0.3%
670                  3      0.2%
660                  3      0.2%
1254                 3      0.2%
793                  3      0.2%
668                  3      0.2%
795                  3      0.2%
Other values (392)   509    34.9%

Value                Count  Frequency (%)
0                    829    56.8%
110                  1      0.1%
167                  1      0.1%
192                  1      0.1%
208                  1      0.1%
213                  1      0.1%
220                  1      0.1%
224                  1      0.1%
240                  2      0.1%
252                  2      0.1%

Value                Count  Frequency (%)
2065                 1      0.1%
1872                 1      0.1%
1818                 1      0.1%
1796                 1      0.1%
1611                 1      0.1%
1589                 1      0.1%
1540                 1      0.1%
1538                 1      0.1%
1523                 1      0.1%
1519                 1      0.1%
 

LowQualFinSF
Real number (ℝ≥0)

ZEROS

Distinct: 24
Distinct (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.844520548
Minimum: 0
Maximum: 572
Zeros: 1434
Zeros (%): 98.2%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 572
Range: 572
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 48.62308143
Coefficient of variation (CV): 8.319430317
Kurtosis: 83.23481667
Mean: 5.844520548
Median Absolute Deviation (MAD): 0
Skewness: 9.011341288
Sum: 8533
Variance: 2364.204048
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=24)
Value                Count  Frequency (%)
0                    1434   98.2%
80                   3      0.2%
360                  2      0.1%
528                  1      0.1%
53                   1      0.1%
120                  1      0.1%
144                  1      0.1%
156                  1      0.1%
205                  1      0.1%
232                  1      0.1%
234                  1      0.1%
371                  1      0.1%
572                  1      0.1%
390                  1      0.1%
392                  1      0.1%
397                  1      0.1%
420                  1      0.1%
473                  1      0.1%
479                  1      0.1%
481                  1      0.1%
513                  1      0.1%
514                  1      0.1%
515                  1      0.1%
384                  1      0.1%

Value                Count  Frequency (%)
0                    1434   98.2%
53                   1      0.1%
80                   3      0.2%
120                  1      0.1%
144                  1      0.1%
156                  1      0.1%
205                  1      0.1%
232                  1      0.1%
234                  1      0.1%
360                  2      0.1%

Value                Count  Frequency (%)
572                  1      0.1%
528                  1      0.1%
515                  1      0.1%
514                  1      0.1%
513                  1      0.1%
481                  1      0.1%
479                  1      0.1%
473                  1      0.1%
420                  1      0.1%
397                  1      0.1%
 

GrLivArea
Real number (ℝ≥0)

Distinct: 861
Distinct (%): 59.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1515.463699
Minimum: 334
Maximum: 5642
Zeros: 0
Zeros (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 334
5-th percentile: 848
Q1: 1129.5
Median: 1464
Q3: 1776.75
95-th percentile: 2466.1
Maximum: 5642
Range: 5308
Interquartile range (IQR): 647.25

Descriptive statistics

Standard deviation: 525.4803834
Coefficient of variation (CV): 0.3467456092
Kurtosis: 4.895120581
Mean: 1515.463699
Median Absolute Deviation (MAD): 326
Skewness: 1.366560356
Sum: 2212577
Variance: 276129.6334
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
864                  22     1.5%
1040                 14     1.0%
894                  11     0.8%
848                  10     0.7%
1456                 10     0.7%
912                  9      0.6%
1200                 9      0.6%
816                  8      0.5%
1092                 8      0.5%
1344                 7      0.5%
1728                 7      0.5%
987                  7      0.5%
1056                 6      0.4%
1224                 6      0.4%
1768                 6      0.4%
1494                 6      0.4%
1484                 6      0.4%
630                  6      0.4%
1144                 5      0.3%
1314                 5      0.3%
960                  5      0.3%
1252                 5      0.3%
1710                 5      0.3%
1392                 5      0.3%
988                  5      0.3%
Other values (836)   1267   86.8%

Value                Count  Frequency (%)
334                  1      0.1%
438                  1      0.1%
480                  1      0.1%
520                  1      0.1%
605                  1      0.1%
616                  1      0.1%
630                  6      0.4%
672                  2      0.1%
691                  1      0.1%
693                  1      0.1%

Value                Count  Frequency (%)
5642                 1      0.1%
4676                 1      0.1%
4476                 1      0.1%
4316                 1      0.1%
3627                 1      0.1%
3608                 1      0.1%
3493                 1      0.1%
3447                 1      0.1%
3395                 1      0.1%
3279                 1      0.1%
 

WoodDeckSF
Real number (ℝ≥0)

ZEROS

Distinct: 274
Distinct (%): 18.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 94.24452055
Minimum: 0
Maximum: 857
Zeros: 761
Zeros (%): 52.1%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 168
95-th percentile: 335
Maximum: 857
Range: 857
Interquartile range (IQR): 168

Descriptive statistics

Standard deviation: 125.3387944
Coefficient of variation (CV): 1.329931901
Kurtosis: 2.992950925
Mean: 94.24452055
Median Absolute Deviation (MAD): 0
Skewness: 1.541375757
Sum: 137597
Variance: 15709.81337
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    761    52.1%
192                  38     2.6%
100                  36     2.5%
144                  33     2.3%
120                  31     2.1%
168                  28     1.9%
140                  15     1.0%
224                  14     1.0%
240                  10     0.7%
208                  10     0.7%
216                  9      0.6%
180                  8      0.5%
160                  8      0.5%
250                  6      0.4%
132                  6      0.4%
264                  6      0.4%
143                  6      0.4%
96                   6      0.4%
156                  6      0.4%
171                  5      0.3%
48                   5      0.3%
196                  5      0.3%
105                  5      0.3%
288                  5      0.3%
210                  5      0.3%
Other values (249)   393    26.9%

Value                Count  Frequency (%)
0                    761    52.1%
12                   2      0.1%
24                   2      0.1%
26                   2      0.1%
28                   2      0.1%
30                   1      0.1%
32                   1      0.1%
33                   1      0.1%
35                   1      0.1%
36                   4      0.3%

Value                Count  Frequency (%)
857                  1      0.1%
736                  1      0.1%
728                  1      0.1%
670                  1      0.1%
668                  1      0.1%
635                  1      0.1%
586                  1      0.1%
576                  1      0.1%
574                  1      0.1%
550                  1      0.1%
 

OpenPorchSF
Real number (ℝ≥0)

ZEROS

Distinct: 202
Distinct (%): 13.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 46.66027397
Minimum: 0
Maximum: 547
Zeros: 656
Zeros (%): 44.9%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 25
Q3: 68
95-th percentile: 175.05
Maximum: 547
Range: 547
Interquartile range (IQR): 68

Descriptive statistics

Standard deviation: 66.25602768
Coefficient of variation (CV): 1.419966538
Kurtosis: 8.490335806
Mean: 46.66027397
Median Absolute Deviation (MAD): 25
Skewness: 2.36434174
Sum: 68124
Variance: 4389.861203
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    656    44.9%
36                   29     2.0%
48                   22     1.5%
20                   21     1.4%
40                   19     1.3%
45                   19     1.3%
30                   16     1.1%
24                   16     1.1%
60                   15     1.0%
39                   14     1.0%
28                   14     1.0%
44                   13     0.9%
50                   13     0.9%
54                   13     0.9%
72                   12     0.8%
98                   11     0.8%
63                   11     0.8%
35                   11     0.8%
32                   11     0.8%
75                   10     0.7%
42                   10     0.7%
120                  10     0.7%
96                   10     0.7%
64                   9      0.6%
66                   9      0.6%
Other values (177)   466    31.9%

Value                Count  Frequency (%)
0                    656    44.9%
4                    1      0.1%
8                    1      0.1%
10                   1      0.1%
11                   1      0.1%
12                   3      0.2%
15                   1      0.1%
16                   8      0.5%
17                   2      0.1%
18                   5      0.3%

Value                Count  Frequency (%)
547                  1      0.1%
523                  1      0.1%
502                  1      0.1%
418                  1      0.1%
406                  1      0.1%
364                  1      0.1%
341                  1      0.1%
319                  1      0.1%
312                  2      0.1%
304                  1      0.1%
 

EnclosedPorch
Real number (ℝ≥0)

ZEROS

Distinct: 120
Distinct (%): 8.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21.95410959
Minimum: 0
Maximum: 552
Zeros: 1252
Zeros (%): 85.8%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 180.15
Maximum: 552
Range: 552
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 61.1191486
Coefficient of variation (CV): 2.783950237
Kurtosis: 10.43076594
Mean: 21.95410959
Median Absolute Deviation (MAD): 0
Skewness: 3.089871904
Sum: 32053
Variance: 3735.550326
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    1252   85.8%
112                  15     1.0%
96                   6      0.4%
120                  5      0.3%
144                  5      0.3%
192                  5      0.3%
216                  5      0.3%
252                  4      0.3%
116                  4      0.3%
156                  4      0.3%
126                  3      0.2%
228                  3      0.2%
128                  3      0.2%
184                  3      0.2%
102                  3      0.2%
150                  3      0.2%
40                   3      0.2%
176                  3      0.2%
164                  3      0.2%
77                   2      0.1%
185                  2      0.1%
80                   2      0.1%
180                  2      0.1%
84                   2      0.1%
160                  2      0.1%
Other values (95)    116    7.9%

Value                Count  Frequency (%)
0                    1252   85.8%
19                   1      0.1%
20                   1      0.1%
24                   1      0.1%
30                   1      0.1%
32                   2      0.1%
34                   2      0.1%
36                   2      0.1%
37                   1      0.1%
39                   2      0.1%

Value                Count  Frequency (%)
552                  1      0.1%
386                  1      0.1%
330                  1      0.1%
318                  1      0.1%
301                  1      0.1%
294                  1      0.1%
293                  1      0.1%
291                  1      0.1%
286                  1      0.1%
280                  1      0.1%
 

3SsnPorch
Real number (ℝ≥0)

ZEROS

Distinct: 20
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.409589041
Minimum: 0
Maximum: 508
Zeros: 1436
Zeros (%): 98.4%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 508
Range: 508
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 29.31733056
Coefficient of variation (CV): 8.598493896
Kurtosis: 123.6623794
Mean: 3.409589041
Median Absolute Deviation (MAD): 0
Skewness: 10.30434203
Sum: 4978
Variance: 859.505871
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=20)
Value                Count  Frequency (%)
0                    1436   98.4%
168                  3      0.2%
216                  2      0.1%
144                  2      0.1%
180                  2      0.1%
245                  1      0.1%
238                  1      0.1%
290                  1      0.1%
196                  1      0.1%
182                  1      0.1%
407                  1      0.1%
304                  1      0.1%
162                  1      0.1%
153                  1      0.1%
320                  1      0.1%
140                  1      0.1%
130                  1      0.1%
96                   1      0.1%
23                   1      0.1%
508                  1      0.1%

Value                Count  Frequency (%)
0                    1436   98.4%
23                   1      0.1%
96                   1      0.1%
130                  1      0.1%
140                  1      0.1%
144                  2      0.1%
153                  1      0.1%
162                  1      0.1%
168                  3      0.2%
180                  2      0.1%

Value                Count  Frequency (%)
508                  1      0.1%
407                  1      0.1%
320                  1      0.1%
304                  1      0.1%
290                  1      0.1%
245                  1      0.1%
238                  1      0.1%
216                  2      0.1%
196                  1      0.1%
182                  1      0.1%
 

ScreenPorch
Real number (ℝ≥0)

ZEROS

Distinct: 76
Distinct (%): 5.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.0609589
Minimum: 0
Maximum: 480
Zeros: 1344
Zeros (%): 92.1%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 160
Maximum: 480
Range: 480
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 55.75741528
Coefficient of variation (CV): 3.70211589
Kurtosis: 18.43906784
Mean: 15.0609589
Median Absolute Deviation (MAD): 0
Skewness: 4.122213743
Sum: 21989
Variance: 3108.889359
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
0                    1344   92.1%
192                  6      0.4%
224                  5      0.3%
120                  5      0.3%
189                  4      0.3%
180                  4      0.3%
160                  3      0.2%
168                  3      0.2%
144                  3      0.2%
126                  3      0.2%
147                  3      0.2%
90                   3      0.2%
200                  2      0.1%
198                  2      0.1%
216                  2      0.1%
184                  2      0.1%
259                  2      0.1%
100                  2      0.1%
176                  2      0.1%
170                  2      0.1%
288                  2      0.1%
142                  2      0.1%
153                  1      0.1%
154                  1      0.1%
152                  1      0.1%
Other values (51)    51     3.5%

Value                Count  Frequency (%)
0                    1344   92.1%
40                   1      0.1%
53                   1      0.1%
60                   1      0.1%
63                   1      0.1%
80                   1      0.1%
90                   3      0.2%
95                   1      0.1%
99                   1      0.1%
100                  2      0.1%

Value                Count  Frequency (%)
480                  1      0.1%
440                  1      0.1%
410                  1      0.1%
396                  1      0.1%
385                  1      0.1%
374                  1      0.1%
322                  1      0.1%
312                  1      0.1%
291                  1      0.1%
288                  2      0.1%
 

PoolArea
Real number (ℝ≥0)

ZEROS

Distinct: 8
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.75890411
Minimum: 0
Maximum: 738
Zeros: 1453
Zeros (%): 99.5%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 738
Range: 738
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 40.17730694
Coefficient of variation (CV): 14.56277759
Kurtosis: 223.2684989
Mean: 2.75890411
Median Absolute Deviation (MAD): 0
Skewness: 14.82837364
Sum: 4028
Variance: 1614.215993
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value                Count  Frequency (%)
0                    1453   99.5%
738                  1      0.1%
648                  1      0.1%
576                  1      0.1%
555                  1      0.1%
519                  1      0.1%
512                  1      0.1%
480                  1      0.1%

Value                Count  Frequency (%)
0                    1453   99.5%
480                  1      0.1%
512                  1      0.1%
519                  1      0.1%
555                  1      0.1%
576                  1      0.1%
648                  1      0.1%
738                  1      0.1%

Value                Count  Frequency (%)
738                  1      0.1%
648                  1      0.1%
576                  1      0.1%
555                  1      0.1%
519                  1      0.1%
512                  1      0.1%
480                  1      0.1%
0                    1453   99.5%
 

MiscVal
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 21
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.4890411
Minimum: 0
Maximum: 15500
Zeros: 1408
Zeros (%): 96.4%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 15500
Range: 15500
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 496.1230245
Coefficient of variation (CV): 11.408001
Kurtosis: 701.0033423
Mean: 43.4890411
Median Absolute Deviation (MAD): 0
Skewness: 24.47679419
Sum: 63494
Variance: 246138.0554
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=21)
Value                Count  Frequency (%)
0                    1408   96.4%
400                  11     0.8%
500                  8      0.5%
700                  5      0.3%
450                  4      0.3%
2000                 4      0.3%
600                  4      0.3%
1200                 2      0.1%
480                  2      0.1%
1150                 1      0.1%
800                  1      0.1%
15500                1      0.1%
620                  1      0.1%
3500                 1      0.1%
560                  1      0.1%
2500                 1      0.1%
1300                 1      0.1%
1400                 1      0.1%
350                  1      0.1%
8300                 1      0.1%
54                   1      0.1%

Value                Count  Frequency (%)
0                    1408   96.4%
54                   1      0.1%
350                  1      0.1%
400                  11     0.8%
450                  4      0.3%
480                  2      0.1%
500                  8      0.5%
560                  1      0.1%
600                  4      0.3%
620                  1      0.1%

Value                Count  Frequency (%)
15500                1      0.1%
8300                 1      0.1%
3500                 1      0.1%
2500                 1      0.1%
2000                 4      0.3%
1400                 1      0.1%
1300                 1      0.1%
1200                 2      0.1%
1150                 1      0.1%
800                  1      0.1%
 

SalePrice
Real number (ℝ≥0)

Distinct: 663
Distinct (%): 45.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 180921.1959
Minimum: 34900
Maximum: 755000
Zeros: 0
Zeros (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 34900
5-th percentile: 88000
Q1: 129975
Median: 163000
Q3: 214000
95-th percentile: 326100
Maximum: 755000
Range: 720100
Interquartile range (IQR): 84025

Descriptive statistics

Standard deviation: 79442.50288
Coefficient of variation (CV): 0.4391000319
Kurtosis: 6.53628186
Mean: 180921.1959
Median Absolute Deviation (MAD): 38000
Skewness: 1.88287576
Sum: 264144946
Variance: 6311111264
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
140000               20     1.4%
135000               17     1.2%
145000               14     1.0%
155000               14     1.0%
190000               13     0.9%
110000               13     0.9%
160000               12     0.8%
115000               12     0.8%
139000               11     0.8%
130000               11     0.8%
125000               10     0.7%
143000               10     0.7%
185000               10     0.7%
180000               10     0.7%
144000               10     0.7%
175000               9      0.6%
147000               9      0.6%
100000               9      0.6%
127000               9      0.6%
165000               8      0.5%
176000               8      0.5%
170000               8      0.5%
129000               8      0.5%
230000               8      0.5%
250000               8      0.5%
Other values (638)   1189   81.4%

Value                Count  Frequency (%)
34900                1      0.1%
35311                1      0.1%
37900                1      0.1%
39300                1      0.1%
40000                1      0.1%
52000                1      0.1%
52500                1      0.1%
55000                2      0.1%
55993                1      0.1%
58500                1      0.1%

Value                Count  Frequency (%)
755000               1      0.1%
745000               1      0.1%
625000               1      0.1%
611657               1      0.1%
582933               1      0.1%
556581               1      0.1%
555000               1      0.1%
538000               1      0.1%
501837               1      0.1%
485000               1      0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
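
The calculation described above can be sketched in plain Python. This is a minimal illustration, not pandas-profiling's own implementation: `pearson_r` is a hypothetical helper name, the population form (dividing by n) is an assumption that cancels in the ratio, and the two lists are the GrLivArea and SalePrice values from the ten first rows shown in the Sample section of this report.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations; the 1/n factors cancel in r.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (std_x * std_y)

# GrLivArea and SalePrice for the ten first rows of the sample (illustration only).
gr_liv_area = [1710, 1262, 1786, 1717, 2198, 1362, 1694, 2090, 1774, 1077]
sale_price = [208500, 181500, 223500, 140000, 250000,
              143000, 307000, 200000, 129900, 118000]

r = pearson_r(gr_liv_area, sale_price)
print(f"Pearson r on the ten sample rows: {r:.3f}")
```

Ten rows are far too few to say anything about the full dataset; the point is only to show the arithmetic.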

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
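
As a rough sketch of that recipe, the code below ranks both variables and then applies the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) are hypothetical, and giving tied values the average of their ranks is an assumption, although it is the usual convention. The cubic example data is made up for illustration.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up monotonic but nonlinear relationship: y = x ** 3.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(f"Spearman rho: {spearman_rho(xs, ys):.3f}")
print(f"Pearson r:    {pearson_r(xs, ys):.3f}")
```

Because the relationship is strictly increasing, the ranks of both variables coincide, which is why ρ reaches its maximum while r on the raw values does not.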

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
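
The pair-counting definition above translates almost literally into code. The sketch below implements exactly that formula (often called tau-a); `kendall_tau_a` is a hypothetical name, and library implementations may apply a tie correction (tau-b), so values can differ when the data contains ties. The small example lists are made up.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    concordant = 0
    discordant = 0
    n_pairs = 0
    for i, j in combinations(range(len(xs)), 2):
        n_pairs += 1
        # Positive product: both variables move in the same direction.
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in either variable count as neither.
    return (concordant - discordant) / n_pairs

# Made-up data: two of the three pairs are concordant, one is discordant.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(f"Kendall tau-a: {tau:.3f}")
```

The double loop over all pairs is quadratic in the number of observations, which is fine for a sketch but slow for large columns.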

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

Index  LotArea  1stFlrSF  2ndFlrSF  LowQualFinSF  GrLivArea  WoodDeckSF  OpenPorchSF  EnclosedPorch  3SsnPorch  ScreenPorch  PoolArea  MiscVal  SalePrice
0      8450     856       854       0             1710       0           61           0              0          0            0         0        208500
1      9600     1262      0         0             1262       298         0            0              0          0            0         0        181500
2      11250    920       866       0             1786       0           42           0              0          0            0         0        223500
3      9550     961       756       0             1717       0           35           272            0          0            0         0        140000
4      14260    1145      1053      0             2198       192         84           0              0          0            0         0        250000
5      14115    796       566       0             1362       40          30           0              320        0            0         700      143000
6      10084    1694      0         0             1694       255         57           0              0          0            0         0        307000
7      10382    1107      983       0             2090       235         204          228            0          0            0         350      200000
8      6120     1022      752       0             1774       90          0            205            0          0            0         0        129900
9      7420     1077      0         0             1077       0           4            0              0          0            0         0        118000
9742010770010770400000118000

Last rows

Index  LotArea  1stFlrSF  2ndFlrSF  LowQualFinSF  GrLivArea  WoodDeckSF  OpenPorchSF  EnclosedPorch  3SsnPorch  ScreenPorch  PoolArea  MiscVal  SalePrice
1450   9000     896       896       0             1792       32          45           0              0          0            0         0        136000
1451   9262     1578      0         0             1578       0           36           0              0          0            0         0        287090
1452   3675     1072      0         0             1072       0           28           0              0          0            0         0        145000
1453   17217    1140      0         0             1140       36          56           0              0          0            0         0        84500
1454   7500     1221      0         0             1221       0           113          0              0          0            0         0        185000
1455   7917     953       694       0             1647       0           40           0              0          0            0         0        175000
1456   13175    2073      0         0             2073       349         0            0              0          0            0         0        210000
1457   9042     1188      1152      0             2340       0           60           0              0          0            0         2500     266500
1458   9717     1078      0         0             1078       366         0            112            0          0            0         0        142125
1459   9937     1256      0         0             1256       736         68           0              0          0            0         0        147500

Duplicate rows

Most frequent

Index  LotArea  1stFlrSF  2ndFlrSF  LowQualFinSF  GrLivArea  WoodDeckSF  OpenPorchSF  EnclosedPorch  3SsnPorch  ScreenPorch  PoolArea  MiscVal  SalePrice  count
0      2522     970       739       0             1709       0           40           0              0          0            0         0        130000     2